Abstract

This technical report presents the work carried out at the University
of Lugano on the TREC 2014 Federated Web Search track. The main
motivation behind our approach is to provide better coverage of opinions
that are present in federated resources. In the resource selection and
vertical selection steps, we apply opinion mining to select opinionated
resources/verticals given a user's query. We do this by combining
relevance-based selection with lexicon-based opinion mining. In the results
merging step, we diversify the final document ranking based on sentiment
using the retrieval-interpolated diversification method.

Keywords: federated search, resource selection, vertical selection,
results merging, sentiment diversification

1  Introduction 

This paper describes the participation of the University of Lugano, in
collaboration with the University of Amsterdam, in the TREC 2014 Federated Web
Search track (FedWeb14).¹ We participated in three tasks: resource selection,
vertical selection and results merging. Our aims are, first, to examine the
effectiveness of opinion mining approaches for the vertical and resource
selection tasks and, second, to apply sentiment diversification to the results
merging task and examine whether this approach can lead to better retrieval
performance.

Federated search, also known as Distributed Information Retrieval (DIR),
offers the means of simultaneously searching multiple information resources²
using a single search interface and includes three phases: resource
representation, resource selection and results merging [4, 10].

1 https://sites.google.com/site/trecfedweb

2 In this report, the terms resource and search engine are used interchangeably
to denote a set of documents that belong to the same information source.
The goal of the FedWeb track is “to evaluate approaches to federated search
at very large scale in a realistic setting, by combining the search results of
existing web search engines”. The FedWeb14 collection is different from a
typical document collection because it consists of search results retrieved
from 149 different search engines, each of which is mapped to one vertical
(e.g., news, sports, kids, etc.).

The FedWeb14 track focuses on three tasks: vertical selection, resource
selection and results merging. Vertical selection aims to identify the subset
of categories that will give the most relevant results given a user's query.
The aim of the resource selection task is to identify a set of the most
relevant resources given the query, while in the results merging task the
retrieval results from the selected resources should be merged into a single
result list.

The experiments described in this technical report aim to explore an
important issue: the effect of considering opinions at different steps of
federated search. For the resource selection task, we follow approaches that
combine relevance and opinion [6]. To calculate the topical relevance of
resources, we apply the widely used ReDDE resource selection method [11]. To
calculate the opinionatedness of resources, we use the lexicon-based approach
that counts the number of SentiWordNet terms appearing in documents of each
resource [2]. For the last step, i.e., combining relevance and opinion, we use
CombSUM [9].

For the results merging task, we apply sentiment diversification to produce
a final result list that covers different sentiments, namely positive,
negative and neutral. To this end, we first retrieve documents from the top-20
resources selected in the resource selection phase. Second, we calculate
document relevance scores based on their ranks and the relevance scores of the
corresponding resources, as in [8]. Third, we apply the retrieval-interpolated
framework [1] to diversify results by their sentiments.

In  this  year’s  track,  organisers  introduced  a  new  task:  vertical  selection. 
In  this  task,  participants  are  asked  to  predict  relevant  verticals  (such  as  news, 
sports,  etc.)  given  a  user’s  query.  For  this  task,  we  simply  used  the  ranking  of 
resources,  produced  on  the  resource  selection  phase,  and  the  mapping  between 
these  resources  and  corresponding  verticals  to  produce  our  results. 

The  rest  of  the  report  is  organised  as  follows.  In  Section  2  we  detail  our 
approach  for  resource  selection,  vertical  selection  and  results  merging  tasks.  In 
Section  3  we  describe  our  experimental  setup  and  report  results.  Section  4 
concludes  our  report. 

2  Opinions  in  Federated  Search 

2.1  Resource  Selection 

When  a  user  submits  a  query  to  a  federated  search  system,  resource  selection 
aims  to  identify  the  most  relevant  resources  that  will  further  process  the  query. 
For the resource selection task in FedWeb14, participants are given a set of
queries,  a  set  of  search  engines/resources  and  a  set  of  sample  documents  for 
each  resource.  For  each  query,  participants  are  asked  to  return  a  ranked  list  of 
search  engines  according  to  their  relevance  to  the  query.  Our  approach  to  the 
resource  selection  task  focuses  on  identifying  resources  that  are  both  relevant 
to  a  query  and  contain  opinion. 
In our experiments, we apply the widely used ReDDE resource selection
technique [11] to produce the ranking of the resources. In particular, for
every query q we first calculate retrieval scores s(d|q) for documents
contained in the centralized sample index (CSI). To build the CSI, we use
documents sampled from 149 search engines using a set of 4000 queries (sample
documents are provided by the organisers). We use the DFR_BM25 retrieval
function from Terrier³ because it showed slightly better results compared to
other unsupervised retrieval approaches. Then the score of resource R is
calculated as follows:


s(R|q) = [right-hand side lost in extraction]    (1)


where  m  is  the  number  of  documents  in  CSI  that  were  sampled  from  resource  R. 

In  order  to  calculate  the  opinion  score  of  resource  R,  we  aggregate  opinion 
scores  of  documents  belonging  to  this  resource.  In  particular,  the  opinion  score 
of  resource  R  is  calculated  as: 

$$ o(R) = \frac{1}{|R|} \sum_{d \in R} o(d) \tag{2} $$


where o(d) is the opinion score of document d and |R| is the number of
documents sampled from resource R. The opinion score of a document is
calculated as the expected opinion score of its terms:


$$ o(d) = \sum_{t \in d} o(t)\, p(t|d) \tag{3} $$


where p(t|d) is the relative frequency of term t in document d and o(t) is the
sentiment of the term obtained from a pre-built lexicon. The relative frequency
of term t is calculated as:

$$ p(t|d) = \frac{tf(t,d)}{|d|} \tag{4} $$

where tf(t,d) denotes the number of occurrences of term t in document d and
|d| denotes the total number of words in the document.
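
The lexicon-based opinion scores of Equations (3) and (4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the submitted runs: `TOY_LEXICON` is a hypothetical stand-in for scores derived from SentiWordNet, terms missing from the lexicon are assumed to contribute zero, and the resource-level score is assumed to be the average over the sampled documents.

```python
from collections import Counter

# Hypothetical toy lexicon mapping a term to an opinion strength in [0, 1].
# A real run would derive these values from SentiWordNet instead.
TOY_LEXICON = {"great": 0.8, "terrible": 0.9, "good": 0.6}


def document_opinion(tokens, lexicon):
    """Expected opinion score of a document: sum over t of o(t) * tf(t, d) / |d|."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    length = len(tokens)
    # Terms absent from the lexicon contribute 0 (an assumption of this sketch).
    return sum(lexicon.get(term, 0.0) * tf / length for term, tf in counts.items())


def resource_opinion(sampled_docs, lexicon):
    """Average document opinion over the documents sampled from one resource
    (averaging is an assumption of this sketch)."""
    if not sampled_docs:
        return 0.0
    total = sum(document_opinion(doc, lexicon) for doc in sampled_docs)
    return total / len(sampled_docs)


doc_a = ["great", "movie", "great", "plot"]
doc_b = ["the", "plot", "was", "terrible"]
print(document_opinion(doc_a, TOY_LEXICON))
print(resource_opinion([doc_a, doc_b], TOY_LEXICON))
```

Counting term frequencies once per document keeps the computation linear in the document length, which matters when every sampled document in the CSI has to be scored.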

In  order  to  produce  the  final  ranking  of  resources,  we  need  to  combine  their 
relevance  and  opinion  scores.  To  do  this,  we  used  the  CombSUM  data  fusion 
method  [9]: 

$$ s_{\mathrm{final}}(R|q) = s_{\mathrm{norm}}(R|q) + o_{\mathrm{norm}}(R) \tag{5} $$

where s_norm(R|q) and o_norm(R) are the MinMax-normalized relevance and opinion
scores of resource R, respectively.
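
As a minimal sketch of this fusion step, the following Python code applies MinMax normalization to both score sets and sums them as in Equation (5). The function names and the toy scores are illustrative assumptions; in particular, mapping all scores to 0 when they are identical is a choice made here, not something the report specifies.

```python
def min_max(scores):
    """MinMax-normalize a dict of scores into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Degenerate case: all scores equal (handling is an assumption).
        return {key: 0.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def comb_sum(relevance, opinion):
    """CombSUM of normalized relevance and opinion scores per resource."""
    rel_norm = min_max(relevance)
    op_norm = min_max(opinion)
    return {res: rel_norm[res] + op_norm.get(res, 0.0) for res in rel_norm}


def rank_resources(relevance, opinion):
    """Resources sorted by decreasing fused score."""
    fused = comb_sum(relevance, opinion)
    return sorted(fused, key=fused.get, reverse=True)


# Toy scores for three hypothetical resources.
relevance = {"e1": 10.0, "e2": 5.0, "e3": 0.0}
opinion = {"e1": 1.0, "e2": 3.0, "e3": 2.0}
print(rank_resources(relevance, opinion))
```

In this toy example the most relevant resource is overtaken by a less relevant but more opinionated one, which is the intended effect of adding the opinion component.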


2.2  Vertical  Selection 

In  web  search,  a  query  is  associated  with  a  set  of  verticals  each  of  which  focuses 
on  specific  domains  (e.g.,  news,  travel,  and  sports)  or  media  types  (e.g.,  images, 
videos). For the vertical selection task of FedWeb14, participants are given a
set  of  verticals  and  the  mapping  from  resources  to  verticals.  Each  search  engine 
is  associated  with  one  category,  such  as  web,  news,  travel,  video,  etc.  In  order 
to  identify  the  category  of  a  query,  we  use  the  provided  mapping  and  our  results 
from  the  resource  selection  task.  Given  those,  we  assume  that  if  a  search  engine 
is  selected  as  relevant  for  a  given  user’s  query,  then  the  category  (vertical)  of 
this  engine  can  also  be  a  category  of  the  query. 
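
The mapping step can be illustrated with a short Python sketch. The resource identifiers, the mapping and the optional `top_k` cutoff are hypothetical; the submitted runs applied no threshold on the number of selected resources or verticals.

```python
def select_verticals(ranked_resources, resource_to_vertical, top_k=None):
    """Return the verticals of the selected resources, in order of first appearance.

    top_k is an optional cutoff on the resource ranking (an assumption of
    this sketch); with top_k=None every ranked resource is used.
    """
    selected = ranked_resources if top_k is None else ranked_resources[:top_k]
    verticals = []
    seen = set()
    for resource in selected:
        vertical = resource_to_vertical.get(resource)
        if vertical is not None and vertical not in seen:
            seen.add(vertical)
            verticals.append(vertical)
    return verticals


# Hypothetical resource ranking and resource-to-vertical mapping.
ranking = ["e010", "e022", "e003", "e047"]
mapping = {"e010": "news", "e022": "video", "e003": "news", "e047": "travel"}
print(select_verticals(ranking, mapping))
print(select_verticals(ranking, mapping, top_k=2))
```

Without a cutoff, almost every vertical that has at least one ranked resource ends up being predicted, which favours recall over precision.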

3 http://terrier.org
2.3  Results  Merging

Given  a  set  of  most  relevant  resources  produced  on  the  resource  selection  phase 
and  their  retrieval  results,  results  merging  aims  to  combine  those  results  into  a 
single list. The results merging task in FedWeb14 considers documents retrieved
from  the  top-20  resources.  Our  approach  to  results  merging  aims  to  diversify 
the  final  result  list  to  cover  different  sentiments,  namely  positive,  negative  and 
neutral.  To  this  end,  we  consider  both  relevance  and  opinion  scores  of  documents 
when  creating  the  final  merged  list. 

For each query and for each resource, the organisers provide a ranked list
of documents. However, document relevance scores are not available. To
approximate relevance scores s(d|q) for documents from resource R, we transform
the corresponding document ranks r(d|q) as follows:

$$ s(d|q) = \frac{r(d|q)}{n}\, s(R|q) \tag{6} $$

where n is the number of documents in the result list produced by R and s(R|q)
is the resource selection score of R.

For the sentiment diversification step we follow the retrieval-interpolated
diversification approach [1]. More specifically, we apply an adaptation of the
sentiment-contribution-by-strength model (SCS). According to SCS, we first
need to calculate the sentiment of each document. We do this using a
lexicon-based approach and the SentiWordNet lexicon [2]. In particular, we
calculate the sentiment of a document as the expected sentiment of its terms:

$$ \mathrm{sent}(d) = \sum_{t \in d} \mathrm{sent}(t)\, p(t|d) \tag{7} $$

where p(t|d) is the relative frequency of term t in document d (see Equation (4))
and sent(t) is the dictionary sentiment of t as given by SentiWordNet. The
sentiment score sent(d) ranges from -1 to 1: documents with sent(d) ∈ [-1, 0)
are considered negative in terms of opinion, those with sent(d) = 0 neutral,
and those with sent(d) ∈ (0, 1] positive.

After calculating relevance and sentiment scores for all documents returned
by the selected resources, we merge these documents into a single list C by
iteratively adding documents to the final list. Here, every next document d*
should maximize the following function:

$$ d^{*} = \arg\max_{d} \bigl( s_{\mathrm{norm}}(d|q) + \mathrm{sent}'(d) \bigr) \tag{8} $$

where s_norm(d|q) is the MinMax-normalized document relevance score and
sent'(d) is calculated as follows:

$$ \mathrm{sent}'(d) = |\mathrm{sent}(d)| \prod_{\substack{d' \in C \\ d' \text{ of same sent.}}} \bigl( 1 - |\mathrm{sent}(d')| \bigr) \tag{9} $$

where | · | is the absolute value and the product is taken over the documents
already added to the final list C that have the same sentiment as document d.
Essentially, this equation promotes documents with a high sentiment strength
|sent(d)| whose sentiment has a low probability in C.
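
The greedy merging procedure of Equations (8) and (9) can be sketched as follows. This is an illustrative Python sketch under stated assumptions rather than the code of the submitted runs: documents are given as hypothetical `(doc_id, normalized_relevance, sent)` tuples, the optional cutoff `k` is introduced here, and ties are broken by input order.

```python
def sentiment_class(sent):
    """Map a sentiment score in [-1, 1] to its class."""
    if sent > 0:
        return "positive"
    if sent < 0:
        return "negative"
    return "neutral"


def diversify(docs, k=None):
    """Greedily build the merged list C.

    docs: list of (doc_id, normalized_relevance, sent) tuples.
    Each step picks the document maximizing
    s_norm(d|q) + |sent(d)| * prod(1 - |sent(d')|),
    where the product runs over already selected documents of the same class.
    """
    remaining = list(docs)
    selected = []
    # Running product of (1 - |sent(d')|) for each sentiment class.
    discount = {"positive": 1.0, "negative": 1.0, "neutral": 1.0}
    limit = len(remaining) if k is None else min(k, len(remaining))

    def objective(doc):
        _, relevance, sent = doc
        return relevance + abs(sent) * discount[sentiment_class(sent)]

    while len(selected) < limit:
        best = max(remaining, key=objective)
        remaining.remove(best)
        selected.append(best[0])
        discount[sentiment_class(best[2])] *= 1.0 - abs(best[2])
    return selected


# Toy input: two positive documents, one negative, one neutral.
docs = [("a", 1.0, 0.8), ("b", 0.9, 0.7), ("c", 0.6, -0.9), ("d", 0.5, 0.0)]
print(diversify(docs))
```

In this toy input a purely relevance-based ranking would place the second positive document right after the first one, whereas the discounted sentiment term lets the negative document move ahead of it.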
3  Experiments

3.1  Tasks 

The  TREC  2014  Federated  Web  Search  track  proposed  three  tasks: 

•  Vertical selection: given a query and a set of verticals, the goal of this
   task is to select a subset of relevant verticals. In FedWeb 2014,
   participants are given 24 different verticals (e.g., news, blogs, videos,
   etc.).

•  Resource selection: given a query, a set of search engines/resources and
   a set of sample documents for each resource, the goal of this task is to
   return a ranked list of search engines according to their relevance given
   the query.

•  Results merging: given a query, the top-20 resources selected in the
   resource selection phase and their retrieval results, the goal is to merge
   these results into a single list.

3.2  Experimental  Setup 

FedWeb14 contains a collection of search results sampled from 149 search
engines, obtained between April and May 2014. We used Terrier to index this
collection, thus creating the CSI. For the lexicon-based opinion mining methods
we tried the following opinion lexicons: AmazonKLE, SentiWordNet and MPQA.
Based on experiments with the FedWeb13 dataset⁴ we decided to use SentiWordNet
due to its superior performance. To calculate retrieval scores of documents in
the CSI, we considered the following scoring functions from Terrier: BM25,
DLH13, Dirichlet Language Model and DFR_BM25. The experiments on FedWeb13
showed that DFR_BM25 produces the highest MAP, so we used this scoring
function in our runs.

We submitted three runs for the resource and vertical selection tasks. One
run does not consider opinion (ULuganoDFR) while the other two runs do
(ULuganoColL2 and ULuganoDocL2). In ULuganoColL2 we rerank resources
considering both their relevance and opinion, while in ULuganoDocL2 documents
from the CSI are reranked according to their opinion before resource selection
is performed.

Four runs were submitted for the results merging task. Two runs included
sentiment diversification while the other two did not. In the ULugDFRNoOp and
ULugDFROp runs the search engine ranking was obtained from the ULuganoDFR
resource selection run. The ULugFWBsNoOp and ULugFWBsOp runs exploited the
baseline resource selection run provided by TREC.

The tasks were performed on a set of 50 queries provided by FedWeb14. The
effectiveness of vertical selection is evaluated by standard classification
metrics: precision (P), recall (R) and F-measure (F1). The resource selection
task is evaluated by the normalized discounted cumulative gain (nDCG; the
variant introduced in [3]) and the normalized precision (nP) introduced in [5].
The main metric for results merging is nDCG.
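
For reference, the set-based classification metrics used for vertical selection can be computed per query as in the sketch below. This is an illustration only: the vertical names are made up, and how the official evaluation averages the values over queries is not covered here.

```python
def precision_recall_f1(predicted, relevant):
    """Set-based precision, recall and F1 for a single query."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical predicted and relevant verticals for one query.
predicted = ["news", "blogs", "video", "web"]
relevant = ["news", "web", "travel"]
print(precision_recall_f1(predicted, relevant))
```

Predicting many verticals per query increases the chance of covering all relevant ones (high recall) but dilutes precision, since most predicted verticals are not relevant.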

4 http://snipdex.org/datasets/fedweb2013
3.3  Results


The  results  for  the  vertical  selection  task  are  reported  in  Table  1,  for  the  resource 
selection  task  in  Table  2  and  for  the  results  merging  task  in  Table  3.  The 
difference  between  nDCG@100  and  nDCG@100_local  is  that  the  latter  assumes 
that  only  the  top-20  selected  resources  contain  relevant  documents. 


Table 1: Results for vertical selection runs.

Run             P       R       F1
ULuganoDFR      0.117   0.983   0.197
ULuganoColL2    0.117   0.983   0.197
ULuganoDocL2    0.117   0.983   0.197

Table 2: Results for resource selection runs.

Run             nDCG@20   nP@5
ULuganoDFR      0.304     0.164
ULuganoColL2    0.297     0.158
ULuganoDocL2    0.301     0.160

Table 3: Results for results merging runs.

Run             nDCG@20   nDCG@100   nDCG@100_local
ULugDFRNoOp     0.156     0.204      0.362
ULugDFROp       0.146     0.195      0.346
ULugFWBsNoOp    0.251     0.296      0.588
ULugFWBsOp      0.224     0.273      0.545

Table  1  shows  that  all  our  approaches  to  vertical  selection  perform  the  same. 
This  can  be  explained  by  the  fact  that  we  did  not  set  any  thresholds  on  the 
number  of  selected  resources  and/or  verticals,  so  our  vertical  selection  methods 
suggested  a  large  number  of  verticals  (on  average,  17  verticals  out  of  24).  This 
is  the  main  reason  for  high  recall  and  low  precision  of  our  vertical  selection 
approaches. 

Tables 2 and 3 show the results on resource selection and results merging
respectively. The results show that there is no significant difference between
the methods that apply opinion mining or sentiment diversification in federated
search and the baselines. This was not unexpected, since the topics provided by
FedWeb14 were not chosen with respect to their relevance to opinionated
documents. On the other hand, some topics may call for opinionated documents
even though this is not required in this track. With this in mind, the FedWeb
dataset seemed appropriate for our experiments, as it provides a federated
environment in which we could incorporate opinions into federated search.
Previously, sentiment diversification was mainly applied to controversial
topics which required opinionated documents to appear in retrieval results [7].
For such topics presenting different viewpoints is important and, therefore,
sentiment diversification usually performs well [1].

To verify the above hypothesis, we applied sentiment diversification to
results merging on the FedWeb13 dataset, for which relevance judgements and
topic descriptions are available. Table 4 shows results for a subset of topics
from FedWeb13. The descriptions of these topics are given in Table 5. It can be
seen that these topics require documents with opinion. From the results in
Table 4, we observe that our approach, which diversifies the final result list
by sentiment, performs better than the baseline for these topics, suggesting
that sentiment diversification is beneficial for controversial queries.


Table 4: nDCG@20 for a subset of topics from FedWeb13.

Topics                      7007    7084    7109    7415
Baseline (No Opinion)       0.461   0.847   0.659   0.253
Diversified By Sentiment    0.497   0.854   0.745   0.331

Table 5: Topic descriptions.

Topic  Description
7007   You are looking for a thorough text review of Howl from Allen Ginsberg.
7084   You want to read some reviews about the movie 'burn after reading'.
7109   You are in New York, and are looking for a place to eat pho.
7415   You want to know which are this year's most anticipated games.


4  Conclusions 

In  this  paper,  we  described  our  participation  in  the  TREC  2014  Federated  Web 
Search  track.  For  the  resource  selection  and  vertical  selection  tasks,  we  proposed 
to  combine  topical  relevance  with  opinion  and  used  a  lexicon-based  approach 
to  calculate  the  opinionatedness  of  resources/verticals.  For  the  results  merging 
task,  we  used  retrieval-interpolated  diversification  to  provide  a  comprehensive 
overview  of  various  opinions  in  the  merged  result  list. 

The results of our participation in FedWeb14 did not support the claim that
applying opinion mining and sentiment diversification to federated search can
lead to better performance. This can be explained by the fact that topics in
the FedWeb14 collection were not chosen for an opinion-related task and,
therefore, did not require retrieving documents with opinion. On the other
hand, FedWeb13 contains a few topics that ask for opinions and, therefore, our
methods could improve performance for those topics. We believe this is a
promising result which requires further investigation.